Student Data:

My work:

I chose to work on Case 3, but had to complete Case 2 first. Unfortunately, there was not enough time to compute all of the features for Case 3. Therefore, after applying a Lasso regression, I also tested a random forest regressor, which performed better in my case (probably because I did not compute the data exactly as in the articles).

In that sense, to the question in the task, "How can the results of the article be expanded/developed/upgraded?", I would suggest implementing a random forest or a decision tree.

Case 2

Setting up environment and loading the data

Formatting data

Fixing NAs for Romania so we can keep it for the description part

Standardizing
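
As a minimal sketch of what standardizing means here, the snippet below converts a column to z-scores using only the standard library. The function name `standardize`, the sample values, and the choice of the population standard deviation are assumptions for illustration, not the notebook's actual code.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores for a column; missing entries (None) stay None."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = pstdev(observed)  # population SD (assumption; sample SD is also common)
    if sigma == 0:
        # A constant column carries no variation, so every z-score is 0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mu) / sigma for v in values]


# Hypothetical example column with one missing value.
scores = [2.0, 4.0, None, 6.0]
z_scores = standardize(scores)
```

After this step the observed values have mean 0 and unit variance, which keeps variables on different scales comparable in the factor analysis and in penalized models such as Lasso.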

Descriptive analysis

Factor Analysis

Case 3

Building poor_jq

Pressure

emo_demanding

conflicts

phys_demanding, uncomfortable

security (health risk reduced), freedom (will assume reverse values), skills, support, recognition
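
Reversing the values of an item can be sketched as below. The 1 to 5 scale bounds, the function name `reverse_code`, and the sample answers are assumptions; the actual response scale of the survey items may differ.

```python
def reverse_code(value, scale_min=1, scale_max=5):
    """Flip an ordinal answer so that the highest score becomes the lowest."""
    if value is None:
        return None  # keep missing answers missing
    return scale_min + scale_max - value


# Hypothetical raw answers for a "freedom" item, including one missing value.
freedom_raw = [1, 2, None, 5]
freedom_reversed = [reverse_code(v) for v in freedom_raw]
```

With the assumed 1 to 5 scale, an answer of 1 maps to 5 and an answer of 5 maps to 1, so all components of the index point in the same direction.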

Will swap wq024 with wq010_ due to the reduced number of NAs
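
A choice between two candidate items based on missing values can be sketched as follows. The column names `item_a` and `item_b`, the sample rows, and the helper `count_missing` are hypothetical and stand in for the real survey variables.

```python
def count_missing(rows, column):
    """Count rows where the given column is absent or None."""
    return sum(1 for row in rows if row.get(column) is None)


# Hypothetical records; None marks a missing answer.
rows = [
    {"item_a": 3, "item_b": None},
    {"item_a": None, "item_b": None},
    {"item_a": 4, "item_b": 2},
]

candidates = ["item_a", "item_b"]
missing = {col: count_missing(rows, col) for col in candidates}

# Prefer the candidate with the fewest missing values.
best = min(candidates, key=lambda col: missing[col])
```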

Employment

Too many NAs, switching to RE module

Socio-demographic factors

Health

Will take the current version of the health df, as I do not have time to compile all of them

Modelling
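
To illustrate why Lasso can drop features entirely, here is a minimal pure-Python sketch of coordinate descent with soft-thresholding. It is not the implementation used in this notebook: it assumes standardized features, a centered target, no intercept, and the objective (1/(2n))·||y − Xw||² + alpha·||w||₁. All names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Fit Lasso weights by cycling over one coefficient at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy data: the target depends only on the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

In this toy case the irrelevant second feature receives a weight of exactly zero, while the first weight is shrunk below its least-squares value of 2. That shrinkage is one reason a random forest, which applies no such penalty and can capture non-linear effects, may score differently on the same features.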

MR1

MR2

Trying with Random Forest

MR1

MR2